在本文中,我们通过神经生成编码的神经认知计算框架(NGC)提出了一种无反向传播的方法,以机器人控制(NGC),设计了一种完全由强大的预测性编码/处理电路构建的代理,体现计划的原则。具体而言,我们制作了一种自适应剂系统,我们称之为主动预测性编码(ACTPC),该系统可以平衡内部生成的认知信号(旨在鼓励智能探索)与内部生成的仪器信号(旨在鼓励寻求目标行为)最终学习如何使用现实的机器人模拟器(即超现实的机器人套件)来控制各种模拟机器人系统以及复杂的机器人臂,以解决块提升任务并可能选择问题。值得注意的是,我们的实验结果表明,我们提出的ACTPC代理在面对稀疏(外部)奖励信号方面表现良好,并且具有竞争力或竞争性或胜过几种强大的基于反向Prop的RL方法。
translated by 谷歌翻译
在人类中,感知意识促进了来自感官输入的快速识别和提取信息。这种意识在很大程度上取决于人类代理人如何与环境相互作用。在这项工作中,我们提出了主动神经生成编码,用于学习动作驱动的生成模型的计算框架,而不会在动态环境中反正出错误(Backprop)。具体而言,我们开发了一种智能代理,即使具有稀疏奖励,也可以从规划的认知理论中汲取灵感。我们展示了我们框架与深度Q学习竞争力的几个简单的控制问题。我们的代理的强劲表现提供了有希望的证据,即神经推断和学习的无背方法可以推动目标定向行为。
translated by 谷歌翻译
我们介绍了神经堆栈体系结构,包括一个可区分的参数化堆栈操作员,该堆栈操作员近似堆栈推送和弹出操作,以选择明确表示堆栈的参数选择。我们证明了这种堆栈体系结构的稳定性:在任意许多堆栈操作之后,神经堆栈的状态仍然与离散堆栈的状态非常相似。使用神经堆栈和复发性神经网络,我们引入了神经网络下降自动机(NNPDA),并证明具有有限/有界神经元的NNPDA可以模拟任何PDA。此外,我们扩展了建筑,并提出了新的建筑神经状态图灵机(NNTM)。我们证明,具有有界神经元的可区分NNTM可以实时模拟图灵机(TM)。就像神经堆栈一样,这些架构也很稳定。最后,我们扩展了构造,以表明可区分的NNTM等同于通用图灵机(UTM),并且只能使用\ textbf {七个有限/有限的精度}神经元模拟任何TM。这项工作为有界精度RNN的计算能力提供了新的理论界限,并随着内存增强。
translated by 谷歌翻译
在基于人工神经网络的终身学习系统中,最大的障碍之一是在遇到新信息时无法保留旧知识。这种现象被称为灾难性遗忘。在本文中,我们提出了一种新型的连接主义架构,即顺序的神经编码网络,在从数据点流中学习时忘记了,并且与当今的网络不同,它不会通过流行的错误反向传播来学习。基于预测性处理的神经认知理论,我们的模型以生物学上可行的方式适应了突触,而另一个神经系统学会了指导和控制这种类似皮层的结构,模仿了一些基础神经节的某些任务连续控制功能。在我们的实验中,我们证明了与标准神经模型相比,我们的自组织系统经历的遗忘大大降低,表现优于先前提出的方法,包括基于排练/数据缓冲的方法,包括标准(SplitMnist,SplitMnist,Split Mnist等) 。)和定制基准测试,即使以溪流式的方式进行了训练。我们的工作提供了证据表明,在实际神经元系统中模仿机制,例如本地学习,横向竞争,可以产生新的方向和可能性,以应对终身机器学习的巨大挑战。
translated by 谷歌翻译
Large pre-trained models, such as Bert, GPT, and Wav2Vec, have demonstrated great potential for learning representations that are transferable to a wide variety of downstream tasks . It is difficult to obtain a large quantity of supervised data due to the limited availability of resources and time. In light of this, a significant amount of research has been conducted in the area of adopting large pre-trained datasets for diverse downstream tasks via fine tuning, linear probing, or prompt tuning in low resource settings. Normalization techniques are essential for accelerating training and improving the generalization of deep neural networks and have been successfully used in a wide variety of applications. A lot of normalization techniques have been proposed but the success of normalization in low resource downstream NLP and speech tasks is limited. One of the reasons is the inability to capture expressiveness by rescaling parameters of normalization. We propose KullbackLeibler(KL) Regularized normalization (KL-Norm) which make the normalized data well behaved and helps in better generalization as it reduces over-fitting, generalises well on out of domain distributions and removes irrelevant biases and features with negligible increase in model parameters and memory overheads. Detailed experimental evaluation on multiple low resource NLP and speech tasks, demonstrates the superior performance of KL-Norm as compared to other popular normalization and regularization techniques.
translated by 谷歌翻译
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6\% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
translated by 谷歌翻译
End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, development of end-to-end TTS for Indian languages is lagging behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary size. In our work reported in this paper, we have investigated the use of fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural sounding speech in Sanskrit in low resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good Sanskrit spoken knowledge. This is really a very good result, considering the fact that the speech data we have used is of duration 2.5 hours only.
translated by 谷歌翻译
Our education system comprises a series of curricula. For example, when we learn mathematics at school, we learn in order from addition, to multiplication, and later to integration. Delineating a curriculum for teaching either a human or a machine shares the underlying goal of maximizing the positive knowledge transfer from early to later tasks and minimizing forgetting of the early tasks. Here, we exhaustively surveyed the effect of curricula on existing continual learning algorithms in the class-incremental setting, where algorithms must learn classes one at a time from a continuous stream of data. We observed that across a breadth of possible class orders (curricula), curricula influence the retention of information and that this effect is not just a product of stochasticity. Further, as a primary effort toward automated curriculum design, we proposed a method capable of designing and ranking effective curricula based on inter-class feature similarities. We compared the predicted curricula against empirically determined effectual curricula and observed significant overlaps between the two. To support the study of a curriculum designer, we conducted a series of human psychophysics experiments and contributed a new Continual Learning benchmark in object recognition. We assessed the degree of agreement in effective curricula between humans and machines. Surprisingly, our curriculum designer successfully predicts an optimal set of curricula that is effective for human learning. There are many considerations in curriculum design, such as timely student feedback and learning with multiple modalities. Our study is the first attempt to set a standard framework for the community to tackle the problem of teaching humans and machines to learn to learn continuously.
translated by 谷歌翻译
Stackelberg游戏模型,领导者致力于制定策略,而追随者最能做出响应,它发现了广泛的应用程序,特别是针对安全问题。在安全环境中,目标是为了保护某些资产,使领导者计算一个最佳策略。在许多这些应用程序中,追随者实用程序模型的参数尚不确定。分布式优化优化通过允许在可能的模型参数上进行分配来解决此问题,而该分布来自一组可能的分布。目的是最大程度地提高预期的效用,相对于最坏情况下的分布。我们启动了分配稳定模型的研究,以计算最佳策略。我们考虑了对追随者公用事业模型的不确定性的正常形式游戏的情况。我们的主要理论结果是表明,在各种不确定性模型中,始终存在分布稳定的stackelberg平衡。对于一组有限的追随者实用程序函数,我们提出了两种算法,用于计算使用数学程序的分布强烈的Stackelberg平衡(DRSSE)。接下来,在一般情况下,存在无限数量的可能的追随者实用程序功能,并且不确定性在有限支撑的名义分布周围由Wasserstein Ball表示,我们给出了一个增量的基于混合组合编程的算法来计算最佳的算法分配稳定的策略。实验证实了我们在经典的Stackelberg游戏中算法的障碍,这表明我们的进近范围扩展到中型游戏。
translated by 谷歌翻译
机器学习中的超参数优化通常是使用只会导致大约一组超参数的幼稚技术来实现的。尽管贝叶斯优化之类的技术在给定超参数的给定域进行了智能搜索,但不能保证最佳解决方案。大多数这些方法的一个主要缺点是用超参数数量增加其搜索域的指数增加,从而增加了计算成本并使方法缓慢。超参数优化问题本质上是双重优化任务,一些研究尝试了解决此问题的双重解决方案方法。但是,这些研究假设了一组独特的模型权重,可以最大程度地减少训练损失,这通常受到深度学习体系结构的影响。本文讨论了一种基于梯度的双层方法,该方法解决了这些缺点以解决超参数优化问题。所提出的方法可以处理我们在实验中选择正则化高参数的连续超参数。该方法保证了本研究已在理论上证明的一组最佳超参数的收敛。该想法基于使用高斯过程回归近似较低级别的最佳值函数。结果,使用增强拉格朗日方法解决的单个级别约束优化任务缩小为单个级别约束优化任务。我们已经对多层感知器和LENET架构进行了有关MNIST和CIFAR-10数据集的广泛计算研究,以证实该方法的效率。一项针对网格搜索,随机搜索,贝叶斯优化和Hyberband方法的比较研究表明,所提出的算法会收敛于较低的计算,并导致模型在测试集上更好地推广。
translated by 谷歌翻译